This report outlines the sample design and selection process for a sample of census tracts, block groups, and persons from Prince George’s County, MD. The sample was designed to support estimates of the proportion of persons in different age groups who have civic awareness. Civic awareness will be measured by asking respondents for the name of their district representative in the U.S. House of Representatives, the name of their local delegate to the Maryland House of Delegates, and other indicators.
A three-stage cluster sample was drawn, with a probability proportional to size (PPS) sample of 15 primary sampling units (PSUs), a PPS sample of 1 secondary sampling unit (SSU) within each PSU, and a simple random sample (SRS) of elements within each SSU. Given that the goal of this study is to measure civic awareness within certain age domains, a composite measure of size was used that accounted for the prevalence of certain age groups within clusters. Using this method of selection should ensure that a targeted number of respondents per age group will be achieved in the final sample.
First, this report explains the overall sample design and the method of assigning measures of size to PSUs and SSUs. Next, we describe the method of sample selection and the units that were selected. Lastly, we discuss the precision of estimates that can be anticipated from this sample, and the process for correctly estimating the variance of estimates in the achieved sample.
Steps: a. Compute the composite MOS and selection probability for each PSU and SSU b. Quality checks c. Combine ineligible or infeasible SSUs d. Repeat step a and construct the sample frame
Objectives: 1. Self-weighting 2. Equal workload in each PSU
Tracts as PSUs; block groups (BGs) as SSUs; persons as elements. Domains (d): age groups
##Population Information
The purposes of the composite MOS are to get: 1. Self-weighting samples from each of several domains 2. Equal workload in each PSU, i.e., same total sample size in each PSU (across all domains) 3. PSU selection probabilities that give “credit” for containing domains that are relatively rare in the population.
##Calculate composite MOS for Tract and BG (Textbook 10.5)
### See Excel “MOS Tract” & “MOS BG”
The composite MOS for PSU (tract) \(i\) is defined as (10.1)

\(S_i=\sum_{d=1}^{D} f_d N_i(d)\)

The total composite MOS across all PSUs (tracts) can be written as

\(S=\sum_{i\in U}\sum_{d=1}^{D} f_d N_i(d)=\sum_{d=1}^{D} f_d\sum_{i\in U} N_i(d)\)

The selection probability within each PSU (tract) is \(\pi_i=q\,(f_d/S_i)\), and the desired sample in each PSU (tract) can be calculated as \(\pi_i N_i\).
The composite MOS for SSU (BG) \(j\) in PSU \(i\) is defined as (10.1)

\(S_{ij}=\sum_{d=1}^{D} f_d N_{ij}(d)\)

The total composite MOS over the SSUs (BGs) in PSU \(i\), and over all PSUs, can be written as

\(S_{i+}=\sum_{d=1}^{D} f_d N_{i+}(d)\)

\(S_{++}=\sum_{i\in U}\sum_{d=1}^{D} f_d N_{i+}(d)=\sum_{d=1}^{D} f_d\sum_{i\in U} N_{i+}(d)\)

The selection probability of each SSU (BG) is \(\pi_i\pi_{j|i}=q\,\frac{S_{i+}S_{ij}}{S_{++}S_{i+}}\), and the desired sample in each SSU (BG) can be calculated as \(\pi_i\pi_{j|i}N_{ij}\).
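The composite MOS definitions above can be illustrated with a short Python sketch. The domain rates, tract and BG labels, and person counts below are hypothetical values chosen only for illustration; they are not taken from the Excel sheets. The sketch computes the BG-level MOS, sums it to tract level and to the grand total, and checks that the grand total equals the expected sample size implied by the domain rates.

```python
import math

# Hypothetical domain sampling rates f_d (illustrative values only).
rates = {"18-34": 0.002, "35-64": 0.001, "65+": 0.004}

# Hypothetical frame counts N_ij(d): tract -> block group -> domain -> persons.
frame = {
    "T1": {
        "BG1": {"18-34": 400, "35-64": 900, "65+": 200},
        "BG2": {"18-34": 300, "35-64": 700, "65+": 500},
    },
    "T2": {
        "BG1": {"18-34": 800, "35-64": 600, "65+": 100},
    },
}


def composite_mos(counts, rates):
    """Composite MOS: sum over domains d of f_d * N(d)."""
    return sum(rates[d] * counts[d] for d in rates)


# BG-level composite MOS, S_ij.
bg_mos = {
    (tract, bg): composite_mos(counts, rates)
    for tract, bgs in frame.items()
    for bg, counts in bgs.items()
}

# Tract-level MOS (sum of the tract's BG-level MOS) and the grand total.
tract_mos = {
    tract: sum(s for (t, _), s in bg_mos.items() if t == tract)
    for tract in frame
}
total_mos = sum(tract_mos.values())

# Share of each BG within its tract, S_ij / S_i+.
bg_share = {(t, b): s / tract_mos[t] for (t, b), s in bg_mos.items()}

# Expected total sample implied by the rates: sum over d of f_d * N(d).
domain_totals = {
    d: sum(c[d] for bgs in frame.values() for c in bgs.values()) for d in rates
}
expected_n = composite_mos(domain_totals, rates)

# The grand total MOS should match the expected sample size.
assert math.isclose(total_mos, expected_n)
print(tract_mos, total_mos)
```

Because the composite MOS is linear in the domain counts, summing it over BGs and tracts gives the same value as applying the rates to the county-wide domain totals, which is why the final check holds.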
##Check Feasibility of each Tract and BG (Textbook p. 286)
\(f_d\leq S_i/\bar{n}\)
One tract and several BGs do not meet the criterion above. We need to check the map and decide whether to drop or combine these BGs. (The infeasible tract itself contains infeasible BGs, so it may be sufficient to handle the issue at the BG level.)
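The feasibility criterion can be screened programmatically before looking at the map. The sketch below is a minimal illustration, assuming hypothetical MOS values, domain rates, and a hypothetical workload; the helper name `infeasible_units` is introduced here and is not part of the original workflow. A unit is flagged when its MOS divided by the workload falls below the largest domain rate, and combining a flagged unit with a neighbor is modeled simply by adding their MOS values.

```python
def infeasible_units(mos, rates, workload):
    """Return units that fail f_d <= S / workload for at least one domain d."""
    max_rate = max(rates.values())  # the binding domain is the one with the largest rate
    return sorted(unit for unit, s in mos.items() if max_rate > s / workload)


# Hypothetical inputs (illustrative only).
rates = {"18-34": 0.002, "35-64": 0.001, "65+": 0.004}
mos = {"T1": 5.8, "T2": 2.6, "T3": 0.05}
workload = 20  # plays the role of n-bar in the criterion

flagged = infeasible_units(mos, rates, workload)
print(flagged)

# Combining the flagged unit with a neighbor adds their MOS values.
combined = {"T1": mos["T1"], "T2+T3": mos["T2"] + mos["T3"]}
flagged_after = infeasible_units(combined, rates, workload)
print(flagged_after)
```

Which units to combine remains a judgment call that depends on geography, so a screen like this only identifies candidates.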
##Then, recalculate the selection probability and desired sample size in each PSU and SSU; Construct weights
##Quality Control Checks (Textbook 11.1)
\(q^*_{ij}(d)\leq Q_{ij}(d)\)

\(\bar{\bar{q}}\leq Q_{ij}\) for each SSU

\(\bar{n}\bar{\bar{q}}\leq Q_i\) for each PSU

\(\pi_i\), \(\pi_{j|i}\), \(\pi_{k|ij}\) less than or equal to 1
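The four checks above can be applied to each BG with a small helper. This is a sketch under stated assumptions: the function name `qc_flags`, its argument names, and all numbers are hypothetical, and the expected domain samples are passed in as inputs rather than computed from a specific formula.

```python
def qc_flags(q_star, Q_ij_d, q_bar, Q_ij, n_bar, Q_i, probs):
    """Evaluate the quality control checks for one SSU (BG) and its PSU (tract).

    q_star : expected sample per domain in the SSU, q*_ij(d)
    Q_ij_d : population count per domain in the SSU, Q_ij(d)
    q_bar  : workload per SSU
    Q_ij   : total population count in the SSU
    n_bar  : number of SSUs sampled per PSU
    Q_i    : total population count in the PSU
    probs  : selection probabilities to check (PSU, SSU, element stages)
    """
    return {
        "domain_sample_fits": all(q_star[d] <= Q_ij_d[d] for d in q_star),
        "ssu_workload_fits": q_bar <= Q_ij,
        "psu_workload_fits": n_bar * q_bar <= Q_i,
        "probs_at_most_one": all(p <= 1 for p in probs),
    }


# Hypothetical BG where the oldest domain is too small for its expected sample.
flags = qc_flags(
    q_star={"18-34": 6.0, "35-64": 5.0, "65+": 9.0},
    Q_ij_d={"18-34": 400, "35-64": 900, "65+": 8},
    q_bar=20,
    Q_ij=1308,
    n_bar=1,
    Q_i=3000,
    probs=[0.2, 0.45, 0.01],
)
failed = [name for name, ok in flags.items() if not ok]
print(failed)
```

Returning named flags rather than a single pass/fail value makes it easier to see which condition a BG violates when deciding whether to drop or combine it.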
Note: There are 13 tracts that have only one BG; we are not using Sampford’s method.
However, notice that this sample design of selecting tracts first, followed by a single BG per tract, is not the same as selecting BGs directly. If we selected BGs directly using Sampford’s method, all pairs of BGs would have non-zero joint selection probabilities. Since we select tracts and then 1 BG per tract, the joint selection probability of any two BGs in a given tract is zero.
Selecting only 1 BG per tract might raise the question of whether variances can be estimated with this design. We can still estimate design variances because the number of sampled first-stage units (tracts) is 15. See Textbook 9.2.1.
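To illustrate why variance estimation needs only variation between first-stage units, the sketch below uses the with-replacement ("ultimate cluster") estimator of the variance of an estimated total. This estimator is an assumption made here for illustration, not a method stated in this report; the function names, weights, and responses are hypothetical. It depends only on the weighted totals of the sampled tracts, so having a single BG per tract does not prevent its use.

```python
import math
from collections import defaultdict


def psu_weighted_totals(records):
    """Sum weight * y within each PSU; records are (psu, weight, y) tuples."""
    totals = defaultdict(float)
    for psu, weight, y in records:
        totals[psu] += weight * y
    return dict(totals)


def ultimate_cluster_variance(psu_totals):
    """With-replacement variance estimate: m/(m-1) * sum((z_i - mean(z))^2)."""
    m = len(psu_totals)
    if m < 2:
        raise ValueError("at least two PSUs are needed to estimate variance")
    mean_total = sum(psu_totals) / m
    return m / (m - 1) * sum((z - mean_total) ** 2 for z in psu_totals)


# Hypothetical respondent records: (tract, weight, 1 if civically aware else 0).
records = [
    ("T1", 5.0, 1), ("T1", 5.0, 1),
    ("T2", 6.0, 1), ("T2", 6.0, 1), ("T2", 6.0, 0),
    ("T3", 7.0, 1), ("T3", 7.0, 1),
]

totals = psu_weighted_totals(records)
estimated_total = sum(totals.values())
variance = ultimate_cluster_variance(list(totals.values()))
print(estimated_total, variance, math.sqrt(variance))
```

For a proportion rather than a total, a linearization or replication step would be needed on top of this; the sketch covers only the simpler case of a total.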